Catriple: Extracting Triples from Wikipedia Categories

Authors

Qiaoling Liu

Kaifeng Xu

Lei Zhang

Haofen Wang

Yong Yu

Yue Pan

Abstract

As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple describes Wikipedia articles and their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on infoboxes and therefore suffer from low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which requires too much manual effort and exploits only a small part of the knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve wider article coverage with less manual effort. We realize this goal by utilizing the syntax and semantics brought by super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples at 12 confidence levels ranging from 47.0% to 96.4%, covering 78.2% of Wikipedia articles. Among them, 1.27M triples have a confidence of 96.4%. Applications can select triples at whatever confidence level suits their needs.
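The abstract does not spell out how super-sub category pairs yield triples, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a super-category of the form "X by P" names a property P, and that a sub-category ending in the head noun X carries the property value in its prefix. The category names, article titles, function names and the per-pair confidence passed in are all hypothetical illustrations; only the idea of filtering triples by confidence comes from the abstract.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str       # article title
    prop: str          # property derived from the super-category
    value: str         # value derived from the sub-category
    confidence: float  # hypothetical confidence attached to the pair


def split_super_category(super_cat: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'Songs by artist' into ('Songs', 'artist').

    Returns None when the name does not follow the assumed 'X by P' shape.
    """
    head, sep, prop = super_cat.partition(" by ")
    if not sep or not head.strip() or not prop.strip():
        return None
    return head.strip(), prop.strip()


def extract_value(sub_cat: str, head: str) -> Optional[str]:
    """If the sub-category ends with the head noun, treat the prefix as the value."""
    suffix = " " + head.lower()
    if sub_cat.lower().endswith(suffix):
        value = sub_cat[: -len(suffix)].strip()
        return value or None
    return None


def triples_for_pair(super_cat: str, sub_cat: str,
                     articles: List[str], confidence: float) -> List[Triple]:
    """Emit one triple per article in the sub-category (assumed rule, see lead-in)."""
    parsed = split_super_category(super_cat)
    if parsed is None:
        return []
    head, prop = parsed
    value = extract_value(sub_cat, head)
    if value is None:
        return []
    return [Triple(article, prop, value, confidence) for article in articles]


def filter_by_confidence(triples: List[Triple], threshold: float) -> List[Triple]:
    """Keep only triples whose confidence meets the application's threshold."""
    return [t for t in triples if t.confidence >= threshold]


# Hypothetical usage: category and article names are made up for illustration.
triples = triples_for_pair("Songs by artist", "The Beatles songs",
                           ["Hey Jude", "Yesterday"], confidence=0.964)
high_confidence = filter_by_confidence(triples, threshold=0.9)
```

An application that needs precision would pass a high threshold to filter_by_confidence, while one that favors coverage would lower it, which mirrors the on-demand use of confidence levels the abstract describes.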

Similar resources

The Association Rule Mining System for Acquiring Knowledge of DBpedia from Wikipedia Categories

Wikipedia categories are a useful source of knowledge that is usually expressed in a noun-phrase that contains information about concepts of entities or relations among entities. In DBpedia KBs, they categorize their entities into Wikipedia categories using RDF triples. The RDF triples represent only categories of entities, but not concepts of entities or relations among entities despite the fa...

DRETa: Extracting RDF from Wikitables

Tables are widely used in Wikipedia articles to display relational information – they are inherently concise and information rich. However, aside from infoboxes, there are no automatic methods to exploit the integrated content of these tables. We thus present DRETa: a tool that uses DBpedia as a reference knowledge-base to extract RDF triples from generic Wikipedia tables.

Knowledge Base Augmentation using Tabular Data

Large linked data repositories have been built by leveraging semi-structured data in Wikipedia (e.g., DBpedia) and through extracting information from natural language text (e.g., YAGO). However, the Web contains many other vast sources of linked data, such as structured HTML tables and spreadsheets. Often, the semantics in such tables is hidden, preventing one from extracting triples from them...

Extracting Semantics Relationships between Wikipedia Categories

The Wikipedia is the largest online collaborative knowledge sharing system, a free encyclopedia. Built upon traditional wiki architectures, its search capabilities are limited to title and full-text search. We suggest that semantic information can be extracted from Wikipedia by analyzing the links between categories. The results can be used for building a semantic schema for Wikipedia which cou...

Spectral triples of weighted groups

We study spectral triples on (weighted) groups and consider functors between the categories of weighted groups and spectral triples. We study the properties of weights and the corresponding functor for spectral triples coming from discrete weighted groups.

Publication date: 2008
